NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

https://doi.org/10.1145/3495532

Gong, Yifan; Yuan, Geng; Zhan, Zheng; Niu, Wei; Li, Zhengang; Zhao, Pu; Cai, Yuxuan; Liu, Sijia; Ren, Bin; Lin, Xue; et al (September 2022, ACM Transactions on Design Automation of Electronic Systems)

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods—one -search based and the other is rule based—are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48 \( \times \) and 1.73 \( \times \) DNN inference acceleration on CIFAR-10 and ImageNet datasets without accuracy loss.
more » « less
Full Text Available
Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles

Zhao, Pu; Yuan, Geng; Cai, Yuxuan; Niu, Wei; Liu, Qi; Wen, Wujie; Ren, Bin; Wang, Yanzhi; Lin, Xue (December 2021, Proc. ACM/IEEE 58th Design Automation Conference (DAC))
null (Ed.)
Full Text Available
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

https://doi.org/10.1145/3528578

Yuan, Geng; Dong, Peiyan; Sun, Mengshu; Niu, Wei; Li, Zhengang; Cai, Yuxuan; Li, Yanyu; Liu, Jun; Jiang, Weiwen; Lin, Xue; et al (January 2022, ACM Transactions on Embedded Computing Systems)

Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially under a recent witness of the increasing DNN model size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective approaches to significantly reduce computation and memory intensities, and have been implemented in many DNNs on edge devices. However, most state-of-the-art works focus on ad-hoc optimizations, and there lacks a thorough study to comprehensively reveal the potentials and constraints of different edge devices when considering different compression strategies. In this paper, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-based DNN executions using mobile GPU and provide a detailed analysis. Based on the observations obtained from the analysis, we propose a unified optimization framework using block-based pruning to reduce the weight storage and accelerate the inference speed on mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gain while maintaining accuracy.
more » « less
Full Text Available
Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search

Zhao, Pu; Niu, Wei; Yuan, Geng; Cai, Yuxuan; Ren, Bin; Wang, Yanzhi; Lin, Xue (January 2021, The Fifth Workshop on Cognitive Architectures in Conjunction with HPCA 2021)
null (Ed.)
Full Text Available
Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles

https://doi.org/10.1109/DAC18074.2021.9586163

Zhao, Pu; Yuan, Geng; Cai, Yuxuan; Niu, Wei; Liu, Qi; Wen, Wujie; Ren, Bin; Wang, Yanzhi; Lin, Xue (January 2021, Proceedings of the 58th Design Automation Conference (DAC))
null (Ed.)
Full Text Available
Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search

https://doi.org/10.1109/RTAS52030.2021.00043

Zhao, Pu; Niu, Wei; Yuan, Geng; Cai, Yuxuan; Sung, Hsin-Hsuan; Liu, Shaoshan; Liu, Sijia; Shen, Xipeng; Ren, Bin; Wang, Yanzhi; et al (May 2021, IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS))
null (Ed.)
Full Text Available
Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI

https://doi.org/10.1109/ISQED51717.2021.9424332

Yuan, Geng; Liao, Zhiheng; Ma, Xiaolong; Cai, Yuxuan; Kong, Zhenglun; Shen, Xuan; Fu, Jingyan; Li, Zhengang; Zhang, Chengming; Peng, Hongwu; et al (April 2021, 22nd International Symposium on Quality Electronic Design (ISQED))
null (Ed.)
Full Text Available
NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

Li, Zhengang; Yuan, Geng; Niu, Wei; Zhao, Pu; Li, Yanyu; Cai, Yuxuan; Shen, Xuan; Zhan, Zheng; Kong, Zhenglun; Jin, Qing; et al (June 2021, IEEE Conference on Computer Vision and Pattern Recognition)
null (Ed.)
Full Text Available
NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

Li, Zhengang; Yuan, Geng; Niu, Wei; Zhao, Pu; Li, Yanyu; Cai, Yuxuan; Shen, Xuan; Zhan, Zheng; Kong, Zhenglun; Jin, Qing; et al (June 2021, IEEE Conference on Computer Vision and Pattern Recognition)
null (Ed.)
Full Text Available
TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators

https://doi.org/10.23919/DATE51398.2021.9474235

Yuan, Geng; Behnam, Payman; Cai, Yuxuan; Shafiee, Ali; Fu, Jingyan; Liao, Zhiheng; Li, Zhengang; Ma, Xiaolong; Deng, Jieren; Wang, Jinhui; et al (February 2021, Design, Automation & Test in Europe Conference & Exhibition (DATE))

As the number of weight parameters in deep neural networks (DNNs) continues growing, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures with emerging technologies. Resistive Random-Access Memory (ReRAM) crossbar has been utilized to perform insitu matrix-vector multiplication of DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, focusing on reducing weight storage and accelerating computation. However, the existing works capture very few peripheral circuits features such as Analog to Digital converters (ADCs) during the neural network design. Unfortunately, ADCs have become the main part of power consumption and area cost of current mixed-signal accelerators, and the large overhead of these peripheral circuits is not solved efficiently. To address this problem, we propose a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators, named TINYADC, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TINYADC achieves 3.5× and 2.9× power and area reduction, respectively. TINYADC framework optimizes the throughput of state-of-the-art architecture design by 29% and 40% in terms of the throughput per unit of millimeter square and watt (GOPs/s×mm 2 and GOPs/w), respectively.
more » « less
Full Text Available

Search for: All records